P Values are not Error Probabilities
Abstract
Confusion surrounding the reporting and interpretation of results of classical statistical tests is widespread among applied researchers. The confusion stems from the fact that most of these researchers are unaware of the historical development of classical statistical testing methods, and the mathematical and philosophical principles underlying them. Moreover, researchers erroneously believe that the interpretation of such tests is prescribed by a single coherent theory of statistical inference. This is not the case: Classical statistical testing is an anonymous hybrid of the competing and frequently contradictory approaches formulated by R.A. Fisher on the one hand, and Jerzy Neyman and Egon Pearson on the other. In particular, there is a widespread failure to appreciate the incompatibility of Fisher’s evidential p value with the Type I error rate, α, of Neyman–Pearson statistical orthodoxy. The distinction between evidence (p’s) and error (α’s) is not trivial. Instead, it reflects the fundamental differences between Fisher’s ideas on significance testing and inductive inference, and Neyman–Pearson views of hypothesis testing and inductive behavior. Unfortunately, statistics textbooks tend to inadvertently cobble together elements from both of these schools of thought, thereby perpetuating the confusion. So complete is this misunderstanding over measures of evidence versus error that it is not viewed as even being a problem among the vast majority of researchers. The upshot is that despite supplanting Fisher’s significance testing paradigm some fifty years or so ago, recognizable applications of Neyman–Pearson theory are few and far between in empirical work. In contrast, Fisher’s influence remains pervasive. Professional statisticians must adopt a leading role in lowering confusion levels by encouraging textbook authors to explicitly address the differences between Fisherian and Neyman–Pearson statistical testing frameworks.
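The distinction between p and α can be made concrete with a small simulation. The sketch below is an illustration only and does not come from the paper: it assumes a two-sided one-sample z test with known standard deviation, and the function names `two_sided_p_value` and `simulate_rejection_rate` are hypothetical. It shows that α is a property of the decision rule fixed in advance (the long-run rejection rate when the null hypothesis is true), whereas each p value is computed from one data set and varies from sample to sample.

```python
import math
import random


def two_sided_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p value of a z test (sigma assumed known) for H0: mean == mu0."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_rejection_rate(alpha=0.05, n=30, trials=20000, seed=0):
    """Draw repeated samples with H0 true and apply the rule 'reject if p < alpha'."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        p_values.append(two_sided_p_value(sample))
    # Long-run frequency of false rejections: this is what alpha describes.
    rate = sum(p < alpha for p in p_values) / trials
    return rate, p_values


rate, p_values = simulate_rejection_rate()
print(f"rejection rate under H0 at alpha=0.05: {rate:.4f}")
print(f"smallest and largest simulated p values: {min(p_values):.5f}, {max(p_values):.5f}")
```

Under these assumptions the rejection rate stays close to the chosen α however small the individual p values among the rejections happen to be, so reporting an observed p value as if it were the test's Type I error rate mixes the two frameworks.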
Similar resources
Simple estimators of false discovery rates given as few as one or two p-values without strong parametric assumptions.
Multiple comparison procedures that control a family-wise error rate or false discovery rate provide an achieved error rate as the adjusted p-value or q-value for each hypothesis tested. However, since achieved error rates are not understood as probabilities that the null hypotheses are true, empirical Bayes methods have been employed to estimate such posterior probabilities, called local false...
Estimating parameters for probabilistic linkage of privacy-preserved datasets
BACKGROUND: Probabilistic record linkage is a process used to bring together person-based records from within the same dataset (de-duplication) or from disparate datasets using pairwise comparisons and matching probabilities. The linkage strategy and associated match probabilities are often estimated through investigations into data quality and manual inspection. However, as privacy-preserved da...
Questioning Human Error Probabilities in Railways
Human errors are regarded as one of the main causes of railway accidents these days. In spite of this fact, the consideration of human error probabilities in quantified risk analyses has been very rudimentary up to now. A lack of comprehensive data and analyses in the literature leads to the use of estimates and values from other industries. This paper discusses the transferability of human error...
Evaluation of standard error and confidence interval of estimated multilocus genotype probabilities, and their implications in DNA forensics.
Multilocus genotype probabilities, estimated using the assumption of independent association of alleles within and across loci, are subject to sampling fluctuation, since allele frequencies used in such computations are derived from samples drawn from a population. We derive exact sampling variances of estimated genotype probabilities and provide a simple approximation of sampling variances. Comp...
Could Fisher, Jeffreys and Neyman Have Agreed on Testing?
Ronald Fisher advocated testing using p-values, Harold Jeffreys proposed use of objective posterior probabilities of hypotheses and Jerzy Neyman recommended testing with fixed error probabilities. Each was quite critical of the other approaches. Most troubling for statistics and science is that the three approaches can lead to quite different practical conclusions. This article focuses on discu...
ASSOCIATED PROBABILITY INTUITIONISTIC FUZZY WEIGHTED OPERATORS IN BUSINESS START-UP DECISION MAKING
In the study, we propose the Associated Probability Intuitionistic Fuzzy Weighted Averaging (As-P-IFWA) and the Associated Probability Intuitionistic Fuzzy Weighted Geometric (As-P-IFWG) aggregation operators with associated probabilities of a fuzzy measure presenting an uncertainty. Decision makers' evaluations are given as intuitionistic fuzzy values and are used as the arguments of the aggre...